Maithili Sentence Aligned Speech Corpus (Tirhuta Script)
Overview
41:54:30 (hh:mm:ss) | 26 GB | 21,412 Audio Segments | 300 speakers
Dataset Description
The LDC-IL Maithili Sentence Aligned Speech Corpus (Tirhuta Script) dataset comprises audio files in WAV format, accompanied by a corresponding textual layer containing phonetically normalized and orthographically normalized annotations in Tirhuta script. The dataset spans a duration of 41:54:30 (hh:mm:ss) and consists of read speech covering continuous text, representative sentences, and date formats. The data is derived from 147 female and 153 male native Maithili speakers, encompassing diverse age groups and regions. A comprehensive explanation of the dataset can be found in the LDC-IL Maithili Sentence Aligned Speech Corpus (Tirhuta Script) Documentation.
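As a rough sanity check on the figures quoted in this description, the short sketch below derives a few averages from them. It uses only the stated totals (duration, segment count, speaker counts). The constant and function names are illustrative, and the averages assume nothing about how speech is actually distributed across segments or speakers, which this page does not state.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Figures quoted in the dataset description (names are illustrative).
TOTAL_DURATION = "41:54:30"
NUM_SEGMENTS = 21_412
FEMALE_SPEAKERS = 147
MALE_SPEAKERS = 153
NUM_SPEAKERS = FEMALE_SPEAKERS + MALE_SPEAKERS

total_seconds = hms_to_seconds(TOTAL_DURATION)

# Simple averages; the real per-segment and per-speaker spread is unknown.
avg_segment_seconds = total_seconds / NUM_SEGMENTS
avg_speaker_minutes = total_seconds / NUM_SPEAKERS / 60
avg_segments_per_speaker = NUM_SEGMENTS / NUM_SPEAKERS

print(f"Total duration:        {total_seconds} s")
print(f"Speakers:              {NUM_SPEAKERS}")
print(f"Avg. segment length:   {avg_segment_seconds:.2f} s")
print(f"Avg. speech/speaker:   {avg_speaker_minutes:.2f} min")
print(f"Avg. segments/speaker: {avg_segments_per_speaker:.1f}")
```

Under these assumptions the averages come to roughly 7 seconds per audio segment and a little over 8 minutes of speech per speaker, which is consistent with a corpus of read sentences.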
For research citations, please cite the following:
- Dinesh Mishra, Shantanu Kumar, Dr. Narayan Kumar Choudhary, Rajesha N., Prof. Shailendra Mohan. Maithili Sentence Aligned Speech Corpus (Tirhuta Script). Central Institute of Indian Languages, Mysore. ISBN 978-93-48633-51-4
- Rejitha K. S. and Narayan Kumar Choudhary (eds.). 2025. LDC-IL Corpus Insights. Central Institute of Indian Languages, Mysore. ISBN 978-93-48633-33-0
Item specifics
- Authors: Dinesh Mishra, Shantanu Kumar, Dr. Narayan Kumar Choudhary, Rajesha N., Prof. Shailendra Mohan
- Corpus Type: Maithili Sentence Aligned Speech Corpus
- Catalogue Number: 1507
- ISBN: 978-93-48633-51-4
- Data Source: On Field
- Duration: 41:54:30 (hh:mm:ss)
- Number of Audio Segments: 21,412
- Release Date: 2025/03/20
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.